

massachusetts  institute  of  technology  —  artificial  intelligence  laboratory 

Investigating  Shape 
Representation  in  Area  V4 
with  HMAX:  Orientation  and 
Grating  Selectivities 

Minjoon  Kouh  and  Maximilian  Riesenhuber 


AI Memo 2003-021
CBCL  Memo  231 


September  2003 




Abstract 

The question of how shape is represented is of central interest to understanding visual processing in cortex.
While tuning properties of the cells in the early part of the ventral visual stream, thought to be responsible for
object recognition in the primate, are comparatively well understood, several different theories have been
proposed regarding tuning in higher visual areas, such as V4. We used the model of object recognition in
cortex presented by Riesenhuber and Poggio (1999), where more complex shape tuning in higher layers is
the result of combining afferent inputs tuned to simpler features, and compared the tuning properties of
model units in intermediate layers to those of V4 neurons from the literature. In particular, we investigated
the issue of shape representation in visual areas V1 and V4 using oriented bars and various types of gratings
(polar, hyperbolic, and Cartesian), as used in several physiology experiments. Our computational model
was able to reproduce several physiological findings, such as the broadening distribution of the orientation
bandwidths and the emergence of a bias toward non-Cartesian stimuli. Interestingly, the simulation
results suggest that some V4 neurons receive input from afferents with spatially separated receptive fields,
leading to experimentally testable predictions. However, the simulations also show that the stimulus set
of Cartesian and non-Cartesian gratings is not sufficiently complex to probe shape tuning in higher areas,
necessitating the use of more complex stimulus sets.


Copyright  ©  Massachusetts  Institute  of  Technology,  2003 


This  report  describes  research  done  within  the  Center  for  Biological  &  Computational  Learning  in  the  Department  of  Brain  & 
Cognitive  Sciences  and  in  the  Artificial  Intelligence  Laboratory  at  the  Massachusetts  Institute  of  Technology. 

This research was sponsored by grants from: Office of Naval Research (DARPA) under contract No. N00014-00-1-0907,
National Science Foundation (ITR) under contract No. IIS-0085836, National Science Foundation (KDI) under contract
No. DMS-9872936, and National Science Foundation under contract No. IIS-9800032.

Additional support was provided by: AT&T, Central Research Institute of Electric Power Industry, Center for e-Business
(MIT), Eastman Kodak Company, DaimlerChrysler AG, Compaq, Honda R&D Co., Ltd., ITRI, Komatsu Ltd., Merrill-Lynch,
Mitsubishi Corporation, NEC Fund, Nippon Telegraph & Telephone, Oxygen, Siemens Corporate Research, Inc., Sumitomo
Metal Industries, Toyota Motor Corporation, Watch Vision Co., Ltd., and The Whitaker Foundation. M.R. is supported by a
McDonnell-Pew Award in Cognitive Neuroscience.




1  Introduction 

The ventral visual pathway, from primary visual cortex, V1, to inferotemporal cortex, IT, is
considered to be responsible for object recognition in the primate ("what pathway"). In V1,
neurons tend to respond well to oriented bars or edges. Neurons in the intermediate visual
areas are no longer tuned to oriented bars only, but also show responses to other forms and
shapes, at a level not found in primary visual cortex [7, 10]. Finally, in IT, neurons are
responsive to complex shapes like the image of a face or a hand [4, 5, 10, 17].

Understanding how the neural population represents shape information and how such
representations arise within the cortex is one of the main objectives of visual neuroscience.
Many physiological studies have used different sets of visual stimuli in order to identify the
underlying neural mechanisms. However, the nonlinear behaviors of the neurons in higher visual
areas have made it difficult to determine the cortical computational mechanisms for increasing
shape complexity: Considering the infinite number of functions that can be fitted to the limited
set of data points (given by the responses of a neuron to a set of test stimuli), studies that
rely on post hoc function fitting are doomed to fail. Rather, it is essential to have an a priori,
biologically plausible computational hypothesis of how more complex features are built from
simple features, a theory that provides testable predictions.

In this paper, we use the HMAX model of object recognition in cortex developed by Riesenhuber
and Poggio [13], which has been shown to successfully model various aspects of invariant object
recognition in cortex (for a recent review, see [14]), to provide a computational hypothesis of
how complex features in V4 can be built from V1 cell inputs. We here focus our efforts on
understanding the responses of V4 neurons to bars, Cartesian and non-Cartesian grating stimuli,
as arising from a combination of V1 complex cell inputs.

2  Methods 

2.1  The  HMAX  Model  of  Object  Recognition  in 
Cortex 

The HMAX model, proposed by Riesenhuber and Poggio [13], is composed of four hierarchical
feed-forward layers, labelled as S1, C1, S2, and C2. In the first layer, S1, a stimulus image is
convolved with linear filters (e.g., difference of Gaussians or Gabor filters) of various
orientations and sizes. At the C1 layer, the responses from S1 units lying within certain spatial
and scale ranges are pooled over, and the maximum responses are forwarded to the next S2 layer.
Such maximum-based pooling increases robustness to clutter, as well as invariance to stimulus
translation and scaling [13]. (Some recent physiological evidence for the maximum operation
within visual cortex can be found in [6, 8].) At the S2 level, responses of C1 units are combined
into more complex features.


[Figure 1 diagram. Layers, bottom to top: simple cells (S1), complex cells (C1), "composite feature" cells (S2),
"complex composite" cells (C2), view-tuned cells. Connection types: weighted sum, MAX.]

Figure 1: Schematic diagram of the HMAX model. In the standard version, S1 filters come in four
different orientations (0°, 45°, 90°, 135°), and each S2 unit combines four adjacent C1 afferents
in a spatial 2x2 arrangement, producing a total of 256 (4^4) different types of S2 units. At the
final C2 layer, units perform another max pooling operation over all the S2 units of each type,
yielding the 256 output units of the HMAX model, which can in turn provide an input to the
view-tuned units with tuning properties as found in inferotemporal cortex [9, 13]. As the shape
tunings of the corresponding S and C cells in the same layers are very similar, we here confine
ourselves to an analysis of the shape tuning of S cells.

The receptive field of a model unit can be defined as the region of the input stimulus that
produces an excitatory (positive) response of the unit. Due to the pooling operation in the C
layers and the combination of afferents in the S layers, the receptive fields become progressively
larger, going from the lower to the higher layers. In the current model, the receptive field of an
S2 unit is about twice the size of an S1 receptive field, corresponding to the neurons in the
fovea [4]. For example, an S2 unit that combines 2x2 C1 units, which in turn pool over 9x9 S1
units of 17-21 pixels with adjacent receptive field centers, will have a receptive field width of
38 pixels. The C2 units, which pool over the population of the S2 units, have the biggest
receptive field size. The receptive field size of an S2 unit can be adjusted by using different
pooling ranges or different feature combination schemes (other than the 2x2 spatial arrangement).
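The 38-pixel figure can be reproduced with simple arithmetic. The sketch below shows one reading that arrives at this number; the specific assumptions (the largest S1 filter of 21 pixels sets the width, S1 centers are 1 pixel apart, and the two C1 units along each axis are offset by one pooling range of 9 pixels) are ours and are not stated in the text. The helper name `pooled_width` is hypothetical.

```python
def pooled_width(unit_width, n_units, spacing):
    """Width spanned by n_units receptive fields of unit_width pixels
    whose centers are `spacing` pixels apart along one axis."""
    return unit_width + (n_units - 1) * spacing

# Assumption: largest S1 filter (21 px), 9 S1 units per axis, centers 1 px apart.
c1_width = pooled_width(unit_width=21, n_units=9, spacing=1)

# Assumption: 2 C1 units per axis, centers one pooling range (9 px) apart.
s2_width = pooled_width(unit_width=c1_width, n_units=2, spacing=9)

print(c1_width, s2_width)
```

Under these assumptions the C1 unit spans 29 pixels and the S2 unit 38 pixels; other spacings would give other widths, which is the sense in which the receptive field size can be adjusted.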

Thus, the HMAX model performs a series of weighted-sum template-matching operations in the
S-layers and maximum-pooling operations in the C-layers that progressively build up feature
complexity and invariance to scaling and translation, respectively.
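The alternation of the two operations can be illustrated with a toy example. This is a minimal sketch, not the model's implementation: the function names, the uniform weights, and the response values are all made up for illustration.

```python
def c1_response(s1_responses):
    """C layer: maximum over the S1 afferents inside the pooling range."""
    return max(s1_responses)

def s2_response(c1_afferents, weights):
    """S layer: weighted sum over the C1 afferents (2x2 arrangement, flattened)."""
    return sum(w * c for w, c in zip(weights, c1_afferents))

def c2_response(s2_responses):
    """C layer: maximum over all S2 units of the same type."""
    return max(s2_responses)

# The same S1 activity pattern at two positions within one pooling range
# (toy values): max pooling gives the same C1 output for both.
pattern_a = [0.1, 0.9, 0.2, 0.0]
pattern_b = [0.0, 0.1, 0.9, 0.2]
translation_invariant = c1_response(pattern_a) == c1_response(pattern_b)

# Hypothetical uniform weights over four C1 afferents.
afferents = [c1_response(pattern_a), 0.4, 0.3, 0.7]
s2 = s2_response(afferents, weights=[0.25] * 4)
c2 = c2_response([s2, 0.2, 0.5])
```

The max operation discards where in the pooling range the strongest afferent was, which is the source of the translation tolerance; the weighted sum is what makes the S2 unit selective for a particular combination of afferents.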

2.2  Approach 

In our simulations, we measured the responses of the S1 and the S2 model units to different sets
of stimuli (oriented bars and gratings). The baseline-subtracted responses were measured, where
the baseline was defined to be the response to a null stimulus. The simulation procedures and
stimuli were based directly on several physiological studies of macaque monkeys [4, 5, 17], so
that the simulation results with the HMAX model could be readily compared with the experimental
data. Fig. 2 illustrates the rationale behind this study.


[Figures 2 and 3. Figure 2 column titles: "Physiology", "HMAX Model".]

Figure 3: Examples of bar stimuli at varying orientations and positions. Each square corresponds
to the receptive field of a model unit, and the width of the bars shown here is equal to 25% of
the receptive field size.


Figure 2: Experimental paradigm: The goal of the modeling study is to investigate possible
computational mechanisms underlying the increase in feature complexity along the ventral pathway
from V1 to V4 ("?" in the diagram). To this end, we compare the responses of the model units
corresponding to the neurons in V1 and V4 with the physiological data, using the same stimuli
and procedures.

2.3  Stimuli 
2.3.1  Bars 

Our experimental procedure for the orientation selectivity study followed that of Desimone and
Schein [4] as closely as possible. The stimuli were the images of bars at varying orientations
(0° to 180° at 10° intervals) and widths (1, 5, 10, 15, 20, 25, 30, 50, and 70% of the receptive
field size). The bars were always long enough to cover the whole receptive field and were
presented at different locations across the receptive field, as shown in Fig. 3.

The orientation tuning curve of a model unit was obtained by first finding the preferred width of
the bar stimulus and then measuring the maximum (baseline-subtracted) responses over different bar
positions at each orientation. The orientation bandwidth was defined as the full width at half
maximum, with linear interpolation. Figs. 6 and 7 show examples of orientation tuning curves.
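The bandwidth definition above can be made concrete with a short sketch. This is one plausible reading of "full width at half maximum with linear interpolation", assuming the tuning curve is sampled at evenly spaced orientations over one 180° period and that responses are baseline-subtracted with a positive peak; the function name and the handling of curves that never fall to half maximum are our own choices.

```python
def orientation_bandwidth(angles, responses, period=180.0):
    """Full width at half maximum (degrees) around the largest peak.

    Walks outward from the peak in both directions, wrapping around the
    period, and linearly interpolates the half-maximum crossing on each side.
    Returns None if the curve never drops to half maximum on some side.
    """
    n = len(angles)
    peak = max(range(n), key=lambda i: responses[i])
    if responses[peak] <= 0:
        return None
    half = responses[peak] / 2.0

    def distance_to_half(step):
        prev, travelled = peak, 0.0
        for _ in range(n - 1):
            cur = (prev + step) % n
            gap = abs(angles[cur] - angles[prev])
            gap = min(gap, period - gap)  # circular distance between samples
            if responses[cur] <= half:
                frac = (responses[prev] - half) / (responses[prev] - responses[cur])
                return travelled + frac * gap
            travelled += gap
            prev = cur
        return None

    left, right = distance_to_half(-1), distance_to_half(+1)
    if left is None or right is None:
        return None
    return left + right

# Toy tuning curves sampled every 10 degrees (values invented for illustration).
angles = list(range(0, 180, 10))
triangular = [max(0.0, 1.0 - abs(a - 90) / 40.0) for a in angles]
wrapped = [max(0.0, 1.0 - min(a, 180 - a) / 30.0) for a in angles]

bw_triangular = orientation_bandwidth(angles, triangular)
bw_wrapped = orientation_bandwidth(angles, wrapped)
bw_flat = orientation_bandwidth(angles, [1.0] * len(angles))
```

With this reading, a secondary peak widens the bandwidth only if the curve stays above half maximum between it and the primary peak; a completely flat curve has no defined half-maximum crossing and is reported as `None`.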

Again following the convention used in [4], the contrast of the bar image was defined as the
luminance difference between the bar and the background, divided by the background luminance.
Throughout the experiment, the stimulus contrast was fixed at 90%. (Orientation selectivity was
invariant for a wide range of contrasts, as shown in Appendix A.1.)

2.3.2  Gratings 

Three classes of gratings (Cartesian, polar, and hyperbolic) were prepared according to the
following equations (as in [5]). For Cartesian gratings,

$$L_c(x, y) = A_0 + A_1 \cos(2\pi f u + \theta), \qquad (1)$$

$$u(x, y) = x \cos\phi - y \sin\phi. \qquad (2)$$

For polar gratings,

$$L_p(x, y) = A_0 + A_1 \cos(2\pi f_c c + 2\pi f_r r + \theta), \qquad (3)$$

$$c(x, y) = \sqrt{x^2 + y^2}, \qquad (4)$$

$$r(x, y) = \tan^{-1}\frac{y}{x}. \qquad (5)$$

For hyperbolic gratings,

$$L_h(x, y) = A_0 + A_1 \cos(2\pi f \sqrt{|uv|} + \theta), \qquad (6)$$

$$u(x, y) = x \cos\phi - y \sin\phi, \qquad (7)$$

$$v(x, y) = x \sin\phi + y \cos\phi. \qquad (8)$$

The contrast of the grating stimuli was defined by

$$\text{Contrast} = \frac{L_{\max} - L_{\min}}{L_{\max} + L_{\min}}. \qquad (9)$$

The mean value of the grating ($A_0$) was set to a nonzero constant, and its amplitude of
modulation ($A_1$) was adjusted to fit the contrast of 90%.
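The three grating families and the contrast definition can be sketched directly from Eqs. 1-9. The code below is illustrative only: the function names and default values are ours, `atan2` is substituted for the plain arctangent so that all four quadrants are covered, and the absolute value under the square root in the hyperbolic grating is an assumption needed to keep the argument real. Because the luminance ranges over $A_0 \pm A_1$, Eq. 9 reduces to $A_1 / A_0$, so a 90% contrast corresponds to $A_1 = 0.9 A_0$.

```python
import math

def cartesian_grating(x, y, f, phi, theta, a0=1.0, a1=0.9):
    u = x * math.cos(phi) - y * math.sin(phi)
    return a0 + a1 * math.cos(2 * math.pi * f * u + theta)

def polar_grating(x, y, f_c, f_r, theta, a0=1.0, a1=0.9):
    c = math.hypot(x, y)   # concentric coordinate, Eq. 4
    r = math.atan2(y, x)   # radial coordinate, Eq. 5 (atan2 is our substitution)
    return a0 + a1 * math.cos(2 * math.pi * f_c * c + 2 * math.pi * f_r * r + theta)

def hyperbolic_grating(x, y, f, phi, theta, a0=1.0, a1=0.9):
    u = x * math.cos(phi) - y * math.sin(phi)
    v = x * math.sin(phi) + y * math.cos(phi)
    # Assumption: absolute value keeps the square root real for uv < 0.
    return a0 + a1 * math.cos(2 * math.pi * f * math.sqrt(abs(u * v)) + theta)

def grating_contrast(values):
    """Eq. 9: (L_max - L_min) / (L_max + L_min)."""
    lo, hi = min(values), max(values)
    return (hi - lo) / (hi + lo)

def amplitude_for_contrast(a0, contrast):
    """L_max = a0 + a1 and L_min = a0 - a1, so contrast = a1 / a0."""
    return contrast * a0

# Sample a Cartesian grating along the x axis and check its contrast.
samples = [cartesian_grating(i / 100.0, 0.0, f=2.0, phi=0.0, theta=0.0)
           for i in range(101)]
measured = grating_contrast(samples)
```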


[Figure 4 panels: Cartesian, Polar, Hyperbolic.]

Figure  4:  Grating  stimuli  (30  Cartesian,  40  polar,  and  20  hyperbolic  gratings)  as  used  in  [5]. 


These gratings were presented within the receptive field of a model unit at varying phases θ,
in steps of 120° and 180° (as in [5]), and the baseline-subtracted maximum responses were
calculated.

3  Results 

3.1  Orientation  Selectivity 

3.1.1 (V1, S1)

Neurons in visual area V1 exhibit varying degrees of orientation selectivity. The upper left
histogram in Fig. 8 shows the distribution of orientation bandwidth in V1 (from [17]). The median
is 42°, while the median of the oriented cells alone (bandwidth < 90°) is 37°. These results are
summarized in Table 1.


                        Experiment        HMAX
Bandwidths              V1      V4        S1      S2
Median:
  All neurons           42°     75°       39°     77°
  Less than 90°         37°     52°       39°     59°
Percentage:
  Narrow (< 30°)        27%     5%        0%      0%
  Wide (> 90°)          15%     33%       0%      11%

Table 1: Summary of the physiological data (V1, V4) and the simulation results (S1, S2). The
experimental data were taken from [4, 17].

In the original HMAX model [13], each S1 feature was modeled as a difference of Gaussians.
However, these features turn out to have an orientation bandwidth much broader (approximately
90°) than found in the experiment [16], and the Gabor filters were shown to provide a good
approximation to the experimental data in V1 [3, 15]. A Gabor filter is defined as

$$G(x, y) = \exp\left(-\left(\frac{x^2}{2\sigma_x^2} + \frac{y^2}{2\sigma_y^2}\right)\right) \frac{\cos(kx - \phi)}{2\pi\sigma_x\sigma_y}. \qquad (10)$$

By varying $\sigma_x$, $\sigma_y$, and the wave number $k$, the properties of the Gabor filter
can be adjusted [16]. The following parameters are used: Spatial phase $\phi = 0$, so that the
peak is centered. Spatial aspect ratio (x vs. y), $\sigma_x / \sigma_y = 0.6$. The extent in the
x direction, $\sigma_x = 1/3$ of the receptive field. The wave number $k = 2.1 \cdot 2\pi$. In
this neighborhood of $k$, there are two inhibitory surround regions and one excitatory center, as
seen in Fig. 5. These parameters were chosen to produce a median bandwidth of 39°, close to the
median of the V1 bandwidths.

Figure 5: Gabor filters with circular receptive field at four different orientations. The
circular masking was applied to reduce numerical differences between the filters at different
orientations.

A Gabor filter produces an optimal response when the bar stimulus is oriented along the same
direction as the filter itself. The S1 unit shown in Fig. 6 prefers the bar oriented at 0° with
an orientation bandwidth of 34°. For a given set of parameters ($\sigma_x$, $\sigma_y$, and $k$),
the orientation tuning curves of the Gabor filters at different sizes are almost identical to one
another. (See the upper right histogram in Fig. 8.) Therefore, in our model, the distribution of
the orientation bandwidths in the S1 layer is very sharply peaked around a single value. However,
as shown in the following section, even from this extremely homogeneous S1 population, the 2x2
feature combination at the S2 layer can create a wide variety of model units with different
orientation bandwidths.

Figure 6: Tuning curve of an S1 unit with a Gabor-like receptive field. Because of the reflective
symmetry of the bar stimuli, the data for 180°-360° are identical to the data for 0°-180°.
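The Gabor filter of Eq. 10 with the parameter values quoted in this section can be written down in a few lines. This is a minimal sketch under stated assumptions: coordinates are expressed in units of the receptive field width (so the receptive field spans -0.5 to 0.5), the wave number is taken as 2.1 times 2π per receptive field width, and the function and constant names are ours. Rotation to other orientations and the circular mask of Fig. 5 are omitted.

```python
import math

def gabor(x, y, sigma_x, sigma_y, k, phi=0.0):
    """Eq. 10: Gaussian envelope times a cosine carrier along x."""
    envelope = math.exp(-(x ** 2 / (2 * sigma_x ** 2) + y ** 2 / (2 * sigma_y ** 2)))
    return envelope * math.cos(k * x - phi) / (2 * math.pi * sigma_x * sigma_y)

# Parameter values from the text; units are an assumption (receptive field width = 1).
SIGMA_X = 1.0 / 3.0
SIGMA_Y = SIGMA_X / 0.6        # aspect ratio sigma_x / sigma_y = 0.6
K = 2.1 * 2 * math.pi

center = gabor(0.0, 0.0, SIGMA_X, SIGMA_Y, K)        # excitatory center
flank_right = gabor(0.24, 0.0, SIGMA_X, SIGMA_Y, K)  # inside an inhibitory flank
flank_left = gabor(-0.24, 0.0, SIGMA_X, SIGMA_Y, K)
```

With these values the carrier first crosses zero at about 0.12 receptive field widths from the center, so the positive central lobe is flanked by one negative lobe on each side, matching the description of one excitatory center and two inhibitory regions.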

3.1.2  (V4,  S2) 

Moving from V1 to V4, the receptive field size increases, and neurons respond more to shapes of
intermediate complexity [4, 5, 10, 17]. It is not clear exactly how and why neurons in V4 behave
differently from those in V1.

In the standard HMAX model, because the afferents for the S2 units are systematically combined
in a 2x2 arrangement of C1 afferents, every S2 unit can be categorized according to the geometric
configuration of its four afferents, as shown in Table 2. Such a classification scheme turns out
to be a meaningful tool for understanding the response of the S2 population to the bar stimuli.
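The class sizes in Table 2 can be checked by enumerating all 4^4 = 256 afferent configurations. The sketch below is ours, not the authors' code. It assumes that two orientations are "orthogonal" when they differ by 90°, and it assigns group numbers 8 (all afferents the same) down to 1 (all different), a numbering inferred from the descriptions of the groups in the text and figure captions.

```python
from collections import Counter
from itertools import product

ORIENTATIONS = (0, 45, 90, 135)

def orthogonal(a, b):
    return abs(a - b) == 90

def classify(config):
    """Map a 4-tuple of C1 afferent orientations to a group number (assumed 8..1)."""
    tally = Counter(config)
    counts = sorted(tally.values(), reverse=True)
    if counts == [4]:
        return 8                                   # all 4 the same
    if counts == [3, 1]:
        (major, _), (minor, _) = tally.most_common()
        return 6 if orthogonal(major, minor) else 7
    if counts == [2, 2]:
        a, b = tally                               # the two distinct orientations
        return 2 if orthogonal(a, b) else 5
    if counts == [2, 1, 1]:
        singles = [o for o, n in tally.items() if n == 1]
        return 3 if orthogonal(*singles) else 4
    return 1                                       # all 4 different

group_sizes = Counter(classify(c) for c in product(ORIENTATIONS, repeat=4))
for group in sorted(group_sizes, reverse=True):
    print(group, group_sizes[group])
```

The enumeration treats every spatial arrangement of the same four orientations as a distinct S2 unit, which is why, for example, the "all different" class contains 4! = 24 units.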

For example, each orientation tuning curve in Fig. 7, typical of each class, shows that the
responses to the bar stimuli depend strongly on how the afferent features are geometrically
combined. Some model units, whose afferents are aligned in the same orientations, have very
simple unimodal tuning curves resembling that of an S1 unit (group 8). For others (groups 2-7),
the tuning curves show multiple peaks at different orientations. Those in group 1, whose
afferents are at orthogonal or non-parallel orientations to one another, exhibit little or no
orientation tuning.

As a result, S2 units with similar feature configuration tend to have similar orientation
bandwidths, as seen in Fig. 8. The S2 units in groups 6, 7, and 8 have narrow bandwidths around
40°. Group 1 has an extremely broad orientation tuning profile due to its non-parallel,
orthogonal afferents. The orientation bandwidths of groups 3 and 4 are quite variable because of
the secondary peaks: When those secondary peaks are small, only the primary peak contributes to
the orientation bandwidth. Otherwise, the secondary peaks are merged with the primary peak to
yield larger orientation bandwidths. Thus, by adjusting the model parameters that influence the
sharpness and the relative size of the response peaks, it is possible to obtain different
bandwidth distributions. In general, the distributions are upper bounded by group 1 with the flat
orientation tuning profiles and lower bounded by groups 6, 7, and 8.

Fig. 8 and Table 1 summarize one particular simulation result that produced a reasonable
approximation to the physiological data. (See Appendix for the results using different sets of
model parameters.) Note that on average, V4 neurons and S2 units tend to have wider orientation
bandwidths than V1 and S1 units. With a median bandwidth of 75°, V4 neurons have wider
orientation bandwidths than V1 neurons. In the model, there is a sizable increase in the
population of cells with wider bandwidths. The actual percentage values are not very close to the
physiological data, since the S1 population is too simple and homogeneous. (Only 11% of the S2
units in the current model are broadly tuned, whereas in V4, 33% of the neurons have wide
bandwidths.) However, as seen in the next section, the model can cover a broad range of bandwidth
distributions. By including a population of broadly tuned S1 units, the S2 layer will likely show
a more realistic distribution of orientation bandwidths.

3.1.3 (V1, S1) → (S2, V4)

The broadening of the orientation tuning from the S1 to the S2 layer is observed over a wide
range of the model parameter values. In particular, the Gabor wave number k has a strong
influence on both S1 and S2 bandwidths. Fig. 9 shows the changing shapes of the Gabor filter at
different k values.

As k increases, the S1 orientation bandwidth monotonically decreases. The orientation bandwidths
of the S2 units also change, but rather disproportionately, as seen in Fig. 10. As explained
before, for the S2 units in groups 3 and 4, the secondary peaks in the orientation tuning profile
can become significant enough to merge with the primary peaks and yield larger orientation
bandwidths. When the S1 bandwidths get larger, the neighboring peaks in the S2 tuning profile are
more likely to overlap, resulting in the sharp increase of the orientation bandwidths in Fig. 10.

Furthermore, Fig. 10 shows that with a homogeneous population of S1 units, it is possible to
consistently construct a distribution of the S2 units with wider orientation bandwidths. Also
note that the HMAX model can cover a wide range of orientation bandwidths in the S2 layer. Then,
by incorporating a population of more broadly tuned S1 units, a larger percentage of S2 units
would have a broad orientation tuning, yielding an even better fit to the experimental data.
Thus, the increase in orientation bandwidth from V1 to V4 found in the experiments can be
explained as a byproduct of cells in higher areas combining complex cell afferents.

Group  Example   Number  Afferent Configuration
8      | | | |   4       All 4 in the same orientation.
7      | | | /   32      3 in the same orientation, and the other at non-orthogonal orientation.
6      | | | -   16      3 in the same orientation, and the other at orthogonal orientation.
5      | | / /   24      2 in the same orientation, and the other 2 in the same orientation that
                         is non-orthogonal to the first 2.
4      | | - /   96      2 in the same orientation, and the other 2 at different and
                         non-orthogonal orientations to each other.
3      | | / \   48      2 in the same orientation, and the other 2 at different and orthogonal
                         orientations to each other.
2      | | - -   12      2 in the same orientation, and the other 2 in the same orientation that
                         is orthogonal to the first 2.
1      - | / \   24      All 4 in different orientations.

Table 2: 8-class classification scheme for the 256 S2 units. In the Example column, the four
characters represent the possible orientations of the afferent C1 units. The 2x2 geometric
configuration was written as a 1x4 vector for notational convenience. The Number column shows the
number of S2 units belonging to each class.

[Figure 7 panels: Group 1 through Group 8.]

Figure 7: Sample tuning curves of S2 units: The S2 units in group 1 (cf. Table 2) do not respond
much to the bar stimuli, yielding a flat tuning curve. Group 2 shows a sharp bimodal tuning,
whereas in group 5, two peaks are merged to give a larger orientation bandwidth. Groups 3 and 4
have a large node and two small nodes, while groups 6 and 7 have one large node and one small
node, according to the geometric configuration of the afferents. Group 8 has a sharp, unimodal
tuning curve. These tuning curves represent typical results for each group.


3.2  Grating  Selectivity 
3.2.1 (V1, S1)

Neurons in visual area V1 are known to be most responsive to bar-like or Cartesian stimuli, even
though there appears to be a small population of V1 cells more responsive to non-Cartesian
stimuli [10]. In our model, the S1 population is quite homogeneous and clearly shows a bias
toward Cartesian stimuli, as shown in Fig. 11.

[Figure 8 histograms. Surviving panel titles: "V4 (Desimone and Schein 1987)", "S2". Horizontal
axis: Orientation Bandwidth, binned from 10 to 90, then 90-180 and >180.]

Figure 8: Distributions of the orientation bandwidths from the physiological data (V1 and V4,
taken from [4, 17]) and from the simulation results (S1 and S2). The legend in the lower right
histogram shows the 8-class classification scheme given in Table 2.

Figure 9: Gabor filters with varying wave numbers. From left to right, wave number k is increased
from 1.35 to 2.85 in units of 2π. The central excitatory region becomes narrower, and the
orientation bandwidth decreases, going from left to right.

3.2.2  (V4,  S2) 

Using three different classes of gratings as shown in Fig. 4, Gallant et al. [5] reported that
the majority of neurons in visual area V4 gave comparable responses (within a factor of 2) to the
most effective member of each class, while the mean responses to the polar, hyperbolic, and
Cartesian gratings were 11.1, 10.0, and 8.7 spikes/second respectively, as summarized in
Table 3. Furthermore, there was a population of neurons highly selective to non-Cartesian
gratings. Out of 103 neurons, there were 20 that gave more than twice the peak responses to one
stimulus class than to another: 10 showed a preference for the polar, 8 for the hyperbolic, and
2 for Cartesian gratings, as shown in Table 4.

When the HMAX model (with the same set of parameters used in the orientation selectivity studies)
is presented with the same set of gratings, the S2 population exhibits a similar bias toward
non-Cartesian gratings, as summarized in Tables 3 and 4.

              Gallant et al.   HMAX
Polar         11.1             0.14 ± 0.07
Hyperbolic    10.0             0.15 ± 0.06
Cartesian     8.7              0.05 ± 0.04

Table 3: Mean responses to three different classes of gratings. Physiological data are in units
of spikes/second, whereas the model responses (baseline-subtracted) lie between 0 and 1. Even
though the literal comparison of the numerical values is meaningless, the model units and the
neurons both show a clear bias toward non-Cartesian gratings.

Fig.  11  shows  fhaf  fhere  is  a  general  frend  away  from 
fhe  Carfesian  sector,  confirming  fhe  bias  foward  non- 
Carfesian  sfimuli.  A  small  populafion  of  fhe  S2  unifs  re¬ 
sponds  significanfly  more  fo  one  class  of  sfimuli  fhan  fo 
anofher,  as  illusfrafed  by  Table  4  and  by  fhe  dafa  poinfs 


7 


Figure  10:  Top:  Median  orientation  bandwidths  of  SI 
and  S2  units  vs.  the  Gabor  wave  number  k,  plotted  in 
units  of  27r.  Boffom:  Same  dafa,  ploffed  as  SI  band- 
widfh  vs.  S2  bandwidfh.  The  dashed  line  represenfs  fhe 
condifion  where  SI  bandwidfh  =  S2  bandwidfh. 


Gallanf  et  al. 

HMAX 

Polar 

10% 

10% 

Hyperbolic 

8% 

5% 

Carfesian 

2% 

0% 

Figure  11:  Responses  fo  fhe  fhree  grating  classes  (po¬ 
lar,  hyperbolic,  and  Carfesian  grafings),  drawn  in  fhe 
same  convenfion  as  in  Fig.  4  of  [5].  For  each  model 
unif,  fhe  maximum  responses  wifhin  each  grafing  class 
are  freafed  as  a  3-dimensional  vector,  normalized  and 
ploffed  in  fhe  posifive  orfhanf.  This  3-dimensional  plof 
is  viewed  from  fhe  (1, 1, 1) -direction,  so  fhaf  fhe  ori¬ 
gin  will  correspond  fo  a  neuron  whose  maximum  re¬ 
sponses  fo  fhree  grafing  classes  are  identical.  Carfesian- 
preferring  unifs  will  lie  in  fhe  upper  sector,  polar  in 
fhe  lower  leff,  and  hyperbolic  in  fhe  lower  righf  sec- 
for.  The  symbols  oufside  of  fhe  inner  region  correspond 
fo  fhe  model  unifs  fhaf  gave  significanfly  greater  (by  a 
facfor  of  2)  responses  fo  one  sfimulus  class  fhan  fo  an- 
ofher.  The  size  of  each  symbol  reflecfs  fhe  maximum 
response  obfained  across  fhe  entire  stimuli.  Nofe  fhaf 
all  SI  unifs  prefer  Carfesian  over  polar  and  hyperbolic 
grafings,  whereas  mosf  S2  dafa  poinfs  lie  in  fhe  lower 
parf  of  fhe  plof,  indicafing  a  general  bias  foward  non- 
Carfesian  grafings. 


Table 4: Percentage of cells whose peak response to one
stimulus class was more than twice the peak response to
another.


lying outside of the inner region in Fig. 11. Note that
the proportions of the cells preferring non-Cartesian
gratings in model and in experiment agree surprisingly
well. While no S2 unit in the standard version of HMAX
prefers Cartesian gratings by this factor-of-two
criterion, this is not a fundamental shortcoming of the
model: S2 units that receive input from a single C1 unit
would show the required preference for Cartesian gratings.

The ratio of the maximum and the minimum responses
to the three grating classes shows that most S2
units (82%, very close to the estimate of 80% in [5])
respond to all three types of gratings comparably (within
a factor of two), as seen in Fig. 12. However, for a small
fraction of cells, this maximum-over-minimum ratio
exceeds 2, indicating an enhanced selectivity toward one
class of stimuli. In particular, the S2 units in group 1
(Table 2) stand out in the distribution, since they
respond weakly to Cartesian stimuli, but strongly enough




Figure 12: Distribution of the response ratio (maximum
over minimum) to the three grating classes. A ratio of
1 indicates that the cell gave the same maximum
responses to all three grating classes.


to non-Cartesian stimuli.
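The max-over-min statistic used here is straightforward to compute from a table of peak responses. A small sketch, assuming NumPy, strictly positive responses, and an array with one row per unit and one column per grating class; the function names and the toy numbers are illustrative only and are not taken from the simulations:

```python
import numpy as np


def max_over_min_ratio(peaks):
    """Ratio of the largest to the smallest peak response, per unit (row)."""
    peaks = np.asarray(peaks, dtype=float)
    return peaks.max(axis=1) / peaks.min(axis=1)


def fraction_within_factor(peaks, factor=2.0):
    """Fraction of units whose three peak responses agree within `factor`."""
    return float(np.mean(max_over_min_ratio(peaks) <= factor))


# Toy peak responses (polar, hyperbolic, Cartesian) for four hypothetical units.
toy_peaks = np.array([
    [0.30, 0.30, 0.30],   # ratio 1: identical peak responses
    [0.20, 0.50, 0.30],   # ratio 2.5
    [0.10, 0.50, 0.40],   # ratio 5
    [0.30, 0.30, 0.40],   # ratio 4/3
])
print(fraction_within_factor(toy_peaks))  # 0.5 for this toy array
```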

Figs. 13 and 14 show the distribution of the S2 unit
responses, along with the 8-class classification scheme
(Table 2). They illustrate that the S2 units in group




Figure 13: Distribution of the maximum responses to the
three grating classes. Each dot corresponds to one of
the 256 S2 units, categorized according to the 8-class
classification scheme along the x-axis. Note that the
distribution for Cartesian gratings is significantly
different from the other two distributions. Some S2 units
that do not respond much to Cartesian gratings respond
well to non-Cartesian gratings (groups 1 and 2), and vice
versa (group 8). Thus, in visual cortex, the
Cartesian-selective cells may receive afferent inputs from
cells with similar orientation selectivities, while the
afferents of non-Cartesian cells would be composed of
cells with different orientation selectivities.


8, whose afferents share parallel orientations,
produce large responses to Cartesian gratings, as
expected. On the other end of the spectrum, the S2
units in group 1, whose pooled afferents are selective
to different orientations, show higher responses to
non-Cartesian gratings.

The average response of the population to each grating
is plotted in Fig. 15(a), where the bias in favor of
non-Cartesian stimuli is again apparent. In good
qualitative agreement with Figure 3-D of [5], the average
population responses are high for polar and hyperbolic
gratings of low/intermediate frequencies. Within the
Cartesian stimulus space, the average response also
peaks around the low/intermediate frequency region.
The concentric grating of low frequency (marked with
*) shows the maximum average response. For reference,
Fig. 15(b,c,d) show the tuning curves of three individual
S2 units that are most selective to each grating type.

One of the major differences between the physiological
data in [5] and the aforementioned simulation results
is the lack of S2 units highly selective to one
stimulus class only. (In the scatter plot, those units
would lie along the direction of (1,0,0), (0,1,0), or
(0,0,1).) In fact, as seen in Fig. 11(b), most of the S2
units lie near the boundary between the polar and the
hyperbolic sectors, meaning they respond quite similarly
to these gratings, but differently to Cartesian gratings.
Fig. 13 indeed shows that the response distributions
for the polar and the hyperbolic gratings are quite
similar.

The above result therefore suggests that the 2x2
arrangement of the Gabor-like features may be too
simplistic, possibly because the sampling of the adjacent
afferents is too correlated to contain enough
distinguishing features across the polar-hyperbolic
dimension. The HMAX model can be extended to investigate
these issues. The feature complexity can be increased by
using different combination schemes (e.g., 3x3) or by
sampling the afferents from non-adjacent regions. Fig. 16
illustrates that by introducing such modifications, it
is possible to obtain more uniformly distributed
responses in the polar-hyperbolic-Cartesian space, while
maintaining a general bias toward non-Cartesian stimuli.
This result suggests that combining non-local,
less-correlated features would be important in building
features that can distinguish object classes better (in
this case, polar vs. hyperbolic gratings). Interestingly,
preliminary data [2], indicating that some V2 receptive
fields appear to show separate directional subfields, are
compatible with this hypothesis of separated C1
afferents to an S2 receptive field.
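One way to picture the two modifications (larger n x n grids and wider spacing) is to list the afferent centres relative to the centre of the S2 receptive field. The sketch below is illustrative only: `afferent_centers` and `grid_extent` are hypothetical helpers, and `spacing` is assumed to be the centre-to-centre distance in units of the C1 pooling range, so that 0.5 would correspond to half-overlapping afferents and 1.0 to adjacent ones.

```python
import numpy as np


def afferent_centers(n, spacing):
    """Centres of an n x n grid of C1 afferents, relative to the grid centre.

    `spacing` is the assumed centre-to-centre distance between neighbouring
    afferents, in units of the C1 pooling range.
    """
    offsets = (np.arange(n) - (n - 1) / 2.0) * spacing
    xx, yy = np.meshgrid(offsets, offsets)
    return np.stack([xx.ravel(), yy.ravel()], axis=1)


def grid_extent(n, spacing):
    """Distance between the outermost afferent centres along one axis."""
    return (n - 1) * spacing
```

For a fixed grid size, increasing `spacing` spreads the same number of afferents over a larger part of the stimulus image, which is the sense in which the combined features become less local.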

Using more C1 afferents, it is also possible to introduce
other variants of S2 units with different grating
selectivities. Using a 3x3 feature combination with 4
different orientations yields 4^9 = 262144 possibilities.
However, increasing the number of afferents also
increases the bias toward non-Cartesian gratings,
since it becomes less likely that most of the afferents
have the same orientation selectivity.
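The counting argument can be checked directly. The sketch below treats every ordered assignment of one of 4 orientations to each afferent as a distinct and equally likely combination, which is an assumption made for illustration; `min_same` is a hypothetical parameter that makes "most of the afferents" precise.

```python
from itertools import product

N_ORIENTATIONS = 4  # number of afferent orientations, as in the text


def n_combinations(n_afferents):
    """Number of ordered orientation assignments to the afferents."""
    return N_ORIENTATIONS ** n_afferents


def fraction_sharing_orientation(n_afferents, min_same):
    """Fraction of assignments in which at least `min_same` afferents
    share one orientation (brute-force enumeration)."""
    hits = 0
    for combo in product(range(N_ORIENTATIONS), repeat=n_afferents):
        if max(combo.count(o) for o in range(N_ORIENTATIONS)) >= min_same:
            hits += 1
    return hits / n_combinations(n_afferents)


print(n_combinations(4), n_combinations(9))   # 256 262144
print(fraction_sharing_orientation(4, 4))     # 4 / 256 = 0.015625
print(fraction_sharing_orientation(9, 9))     # 4 / 262144, about 1.5e-05
```

Under these assumptions, the share of combinations with all afferents parallel drops by three orders of magnitude when going from 2x2 to 3x3.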

3.2.3 (V1, S1) → (S2, V4)

Physiological data show that along the ventral pathway,
the selectivity for non-Cartesian stimuli increases.
Mahon and De Valois [10] reported that there were
more neurons responsive to non-Cartesian gratings in
V2 than in V1. Gallant et al. [5] reported that the
selectivity for non-Cartesian gratings was quite enhanced
in the visual area V4 and that there were very few neurons
highly responsive to Cartesian gratings only.

A similar trend is apparent in our model, or rather
it has been implicitly built into it, by combining
oriented filters (naturally responsive to Cartesian gratings)




Figure 14: When all 256 S2 units are plotted in the same format as Fig. 11, it is apparent that groups 1 and 2 are
composed of highly non-Cartesian units, while the preference for Cartesian stimuli slowly increases toward group
8.


Figure 15: (a) Average population responses and (b,c,d) three sample tuning curves most selective to each of the
three grating classes. The responses are arranged in the same layout as in Fig. 4. The most effective stimulus is
marked with an asterisk (*) on top.




Figure 16: Population responses to the three grating classes, from 100 S2 units that are chosen randomly from all
possible feature combinations. The top row shows the results with the 2x2 feature combination scheme, and the bottom
row shows the 3x3 scheme. Going from left to right, the distance between the C1 afferents is increased. In the
first column, the C1 afferents are partially (1/2) overlapping. In the second column, the C1 afferents are adjacent.
(Thus, the plot in the top row of the second column represents the result using the standard HMAX parameters.)
In the third and the fourth columns, the C1 afferents are even farther apart (1 or 2 times the C1 pooling range). As
the distance between the C1 afferents increases, the features combined in one S2 receptive field are sampled from
farther regions of the stimulus image.


into non-parallel, non-Cartesian features. In the S1
layer, there is no model unit more responsive to
non-Cartesian gratings, whereas in the S2 layer the majority
prefers non-Cartesian gratings.

4  Discussion 

In this paper, a computational model of the ventral
visual stream was used to provide hypotheses regarding
possible mechanisms underlying the observed change
in neuronal feature tuning from V1 to V4. The model
posits that the increase in complexity results from a
simple combination of complex cell afferents. Despite its
simplicity, the model turned out to approximate several
sets of physiological data along the ventral pathway of
the primate visual cortex. In particular, the model
exhibited the broadening of the orientation bandwidth and
the bias toward non-Cartesian stimuli, while successfully
reproducing some of the population statistics.
Interestingly, even a simple 2x2 combination of the
afferents could yield a fairly complex behavior from the
population. Furthermore, it was noted that the model
units whose afferents were non-parallel and orthogonal
yielded a wide orientation bandwidth and
a high selectivity for non-Cartesian stimuli.


The gratings provided a richer set of stimuli and
showed some discrepancies between the standard
HMAX model and the physiological data, in particular
the lack of model units strongly selective for either
polar or hyperbolic gratings. Such model units could only
be obtained by increasing the spatial separation of the
C1 afferents. This provides an interesting prediction for
experiments regarding the receptive field substructure
of neurons in higher visual areas, for which there is
some preliminary experimental evidence in V2 [2].
Interestingly, features based on spatially separated,
complex cell-like afferents have been previously postulated
on computational grounds [1]. An alternative,
more trivial way to obtain cells strongly selective for
non-Cartesian gratings, even though not explored here,
would be to assume more complex, non-Cartesian S1
features that are more selective toward the features
found in the stimulus set. Physiological data indicate that
V1 does contain neurons responsive to radial,
concentric, or hyperbolic gratings [10].

Finally, it appears that the bar and grating stimuli
are too limited as a stimulus set to provide strong
constraints for the model, as the standard HMAX model
seemed to have enough degrees of freedom to cover




various bandwidth distributions and grating
selectivities. For example, the present data did not require
any significant modification of feature combination
schemes or the inclusion of more complex features in
the lower layer of the hierarchical architecture. It will
be interesting to test model unit responses to more
complex stimuli, such as the contour features used by
Pasupathy and Connor [11, 12]: while the tuning of model
units is currently based on shape only, Pasupathy and
Connor have postulated that V4 neurons show evidence
for an object-centered reference frame.

A  Further  Orientation  Selectivity  Studies 

Most  of  the  main  results  in  this  paper  were  obtained 
with  the  following  standard  parameters  (adopted  from 
[16]). 


Parameter                    Value
Stimulus contrast            90%
σ_S2                         1.25
Gabor wave number            2π · 2.1
S1 receptive field size      17, 19, 21 pixels
S2 receptive field size      38 pixels

In this appendix, we study the effects of these
parameters in more detail.

A.1  Stimulus  Contrast

The luminance of a stimulus can be varied in several
different ways. The total luminance, the sum or squared
sum of all pixel values, can be set to a constant.
Alternatively, the background luminance or the minimum
luminance can be set to a constant, and the maximum
luminance can be adjusted according to the definition
of contrast (Eqn. 9). In our study of orientation
selectivity, the background of the stimulus image was set
to a constant value of 1, and the luminance of the bar was
adjusted.
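As an illustration of the last scheme, the bar luminance can be solved for from a target contrast once the background is fixed. Eqn. 9 is not reproduced in this section, so the sketch below assumes the Michelson definition, contrast = (L_max - L_min) / (L_max + L_min), with the bar brighter than the background; if Eqn. 9 uses a different definition, the formula changes accordingly. The function names are hypothetical.

```python
def michelson_contrast(l_max, l_min):
    """Michelson contrast of two luminance values (assumed definition)."""
    return (l_max - l_min) / (l_max + l_min)


def bar_luminance(contrast, background=1.0):
    """Luminance of a bar brighter than `background` that yields `contrast`.

    Obtained by solving contrast = (L_bar - background) / (L_bar + background)
    for L_bar.
    """
    if not 0.0 <= contrast < 1.0:
        raise ValueError("contrast must lie in [0, 1)")
    return background * (1.0 + contrast) / (1.0 - contrast)
```

Under this assumed definition, a contrast of 0 leaves the bar at the background luminance, and the required bar luminance grows without bound as the contrast approaches 100%.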

As shown in the following table, the mean orientation
bandwidths were invariant over a wide range of
stimulus contrasts. However, the high contrast stimuli
produced higher responses from the S1 and, thus, the
S2 units.


              Bandwidths           Responses
Contrast      S1        S2         S1        S2
10%           39.8°     78.0°      0.032     0.013
30%           39.1°     77.5°      0.090     0.037
50%           39.1°     77.2°      0.140     0.059
70%           39.1°     77.2°      0.184     0.078
90%           39.3°     76.8°      0.223     0.095

A.2  Orientation  Bandwidth 

The distribution of the orientation bandwidth is in
general lower bounded by group 8 and upper bounded by
group 1. Between these bounds, the S2 units in groups 3
and 4 have the most variable range of orientation
bandwidths, because of their secondary response peaks in
the orientation tuning curve. Therefore, by manipulating
the model parameters that affect the tuning profiles,
it is possible to obtain different bandwidth
distributions. Some of these parameters are the feature
sensitivity (σ_S2), receptive field geometry, and the
scale. The sharpness of tuning at the S1 level (Gabor
wave number k) was treated in section 3.1.3.

A.2.1  σ_S2

The response of an S2 unit is determined by the afferent
C1 units, which are combined as a product of Gaussians:

    S2 = \exp\left( -\sum_i \frac{(C1_i - 1)^2}{2\sigma_{S2}^2} \right)    (11)

Each Gaussian is centered at 1, since the C1 responses
lie between 0 and 1.
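Written out, a product of Gaussians collapses into a single exponential of a sum. The following assumes that all afferents share the same width σ_S2 and uses the conventional factor of 2 in the Gaussian exponent; both are assumptions about the notation rather than statements taken from the text:

```latex
\prod_i \exp\!\left( -\frac{(C1_i - 1)^2}{2\sigma_{S2}^2} \right)
  = \exp\!\left( -\frac{1}{2\sigma_{S2}^2} \sum_i (C1_i - 1)^2 \right),
\qquad 0 \le C1_i \le 1 .
```

With every C1_i = 1 the exponent vanishes and the response reaches its maximum of 1; any afferent response below 1 lowers it.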

The response of an S2 unit, or the sensitivity to a
feature, is affected by σ_S2. For example, for a large value
of σ_S2, the S2 unit will produce a fairly large response
(close to 1) regardless of the stimulus. If σ_S2 is small,
the S2 unit will only respond to a very specific feature,
determined by the afferent C1 units. Then, as σ_S2 is
varied from a small, to a medium, and to a large value, the
baseline-subtracted response of an S2 unit will go from
0, to an intermediate value, and to 0 again. (In the limit
of σ_S2 << 1, the S2 response will be 0, unless the
stimulus is the optimal feature. If σ_S2 >> 1, the S2 unit
will yield the maximum response 1 for any stimulus. It will
also respond well to a blank stimulus, and therefore the
baseline-subtracted response will be 0 again.)
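The non-monotonic dependence on σ_S2 can be reproduced numerically. The sketch below assumes a Gaussian-product response of the form exp(-Σ_i (C1_i - 1)^2 / (2 σ_S2^2)), takes the C1 responses to a blank stimulus to be all zeros, and uses made-up C1 values for a non-optimal stimulus; none of these numbers come from the simulations, and the function names are hypothetical.

```python
import numpy as np


def s2_response(c1, sigma):
    """Gaussian-product S2 response to a vector of afferent C1 responses."""
    c1 = np.asarray(c1, dtype=float)
    return float(np.exp(-np.sum((c1 - 1.0) ** 2) / (2.0 * sigma ** 2)))


def baseline_subtracted(c1_stim, c1_blank, sigma):
    """S2 response to a stimulus minus the response to a blank image."""
    return s2_response(c1_stim, sigma) - s2_response(c1_blank, sigma)


c1_stim = [0.8, 0.6, 0.7, 0.5]   # hypothetical C1 responses to some stimulus
c1_blank = [0.0, 0.0, 0.0, 0.0]  # assumed C1 responses to a blank image

for sigma in (0.05, 1.0, 100.0):
    print(sigma, baseline_subtracted(c1_stim, c1_blank, sigma))
```

The printed difference is essentially 0 for the smallest and the largest σ_S2 and clearly positive in between.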

The orientation selectivity at various σ_S2 can be
understood similarly. For a large σ_S2, all orientations of
the bar stimulus will produce similar responses, yielding
a flat tuning curve. This will be especially true for
the S2 units in groups 3 and 4, whose primary and
secondary peaks can then easily merge into one wide peak.
Therefore, increasing σ_S2 will have a broadening effect
on the orientation bandwidths, as seen in the following
table and Fig. 17.


σ_S2      Median Bandwidth
0.5       34.7°
1.0       52.2°
1.5       79.3°
4.0       82.8°

A.2.2  Receptive  Field  Geometry 

For all the simulations described in this paper, the
S1 units were given a circular receptive field, in order
to reduce the numerical differences between the principal
(0°, 90°) and the oblique (45°, 135°) orientations and
thus to emphasize only the effects coming from the
inherent architecture (e.g., 2x2 feature combination) of the
model.

For comparison, the circular mask was removed from
the Gabor filters, thereby giving a square receptive field




Figure 17: Distribution of the orientation bandwidth at
various σ_S2. Note that the bandwidths are most variable
for groups 3 and 4. Bandwidths > 180° are shown
as 180°.


to the S1 units. Then, the bandwidth distribution
becomes smoother, with less sharp changes from one
histogram bin to another, resembling the physiological
data more. This effect seems to arise from the
asymmetry of the S1 filters, and, therefore, differences
between individual V1 neurons may play a role in
producing a broad, smooth distribution of the orientation
bandwidths in V4.

The Gabor filters with a square receptive field
have a slightly sharper orientation tuning, since they
have more elongated excitatory and inhibitory regions.
However, regardless of the circularity of the receptive
field, the overall shapes of the tuning profiles match
those shown in Fig. 7, resulting in a similar broadening
trend from the S1 to the S2 layer.

A.2.3  Scaling 

The receptive field sizes of V1 and V4 neurons are
widely distributed. In general, they are positively
correlated with eccentricity. At the fovea, the receptive
fields of V4 neurons are on average twice as large as those
of V1 neurons [4]. When the orientation selectivity
experiments are performed at different scales, the
orientation bandwidth again increases from the S1 to the S2
layer. The following table summarizes the simulation
results, where the receptive field sizes are given in pixels.


          Receptive Field             Bandwidth
Scale     S1                 S2       S1        S2
1         7, 9               16       36.0°     127.7°
2         11, 13, 15         26       39.3°     81.4°
3         17, 19, 21         38       39.3°     76.8°
4         23, 25, 27, 29     52       38.6°     50.0°

Fig. 18 explains the dependence of the S2 bandwidths
on scale (= resolution) as a discretization effect:
as the resolution becomes finer (going from top
to bottom), the primary and the secondary peaks in the
orientation tuning profiles of groups 3 and 4 are better
distinguished, resulting in lower median bandwidths.


Figure 18: Distribution of the orientation bandwidth for
the S2 units. Going from top to bottom, the receptive
field size (and correspondingly the receptive field's
resolution) increases.


B  Further  Grating  Selectivity  Studies 

B.1  Scaling

Here, the behavior of the model at four different scales
was studied as in Section A.2.3, using the same set




of polar, hyperbolic, and Cartesian gratings. The
result shows that at all scales, the overall response
distributions to each class of gratings are quite similar,
with an apparent bias toward non-Cartesian stimuli.
The following table summarizes the average
baseline-subtracted responses and the standard deviations,
which are almost identical across all four scales.


          Responses
Scale     Polar            Hyperbolic       Cartesian
1         0.15 ± 0.05      0.18 ± 0.04      0.06 ± 0.03
2         0.15 ± 0.06      0.16 ± 0.05      0.05 ± 0.04
3         0.14 ± 0.07      0.15 ± 0.06      0.05 ± 0.04
4         0.15 ± 0.07      0.16 ± 0.06      0.05 ± 0.03

Finally, the following table shows the breakdown of
the population within the three grating sectors. The
values inside the parentheses represent the percentage of
S2 units whose peak response to one grating class was
more than twice the response to another. Even though the
breakdowns for the polar and the hyperbolic gratings are
quite variable, since most S2 units lie near the
boundary, an overall preference for non-Cartesian stimuli is
again apparent.


          Population Statistics
Scale     Polar         Hyperbolic     Cartesian
1         36% ( 7%)     53% (8%)       11% (0%)
2         50% ( 7%)     39% (8%)       11% (0%)
3         57% (10%)     32% (5%)       11% (0%)
4         52% (10%)     37% (5%)       11% (0%)

References 

[1]  Y. Amit and D. Geman. Shape quantization and
recognition with randomized trees. Neural Computation,
9:1545-1588, 1997.

[2]  A. Anzai et al. Receptive field structure of monkey
V2 neurons for encoding orientation contrast
[abstract]. Journal of Vision, 2(7), 2002.

[3]  P. Dayan and L. Abbott. Theoretical Neuroscience:
Computational and mathematical modeling of neural
systems. MIT Press, 2001.

[4]  R. Desimone and S. Schein. Visual properties of
neurons in area V4 of the macaque: Sensitivity to
stimulus form. Journal of Neurophysiology,
57:835-868, 1987.

[5]  J. Gallant et al. Neural responses to polar,
hyperbolic, and Cartesian gratings in area V4 of
the macaque monkey. Journal of Neurophysiology,
76:2718-2739, 1996.

[6]  T. Gawne and J. Martin. Responses of primate
visual cortical V4 neurons to simultaneously
presented stimuli. Journal of Neurophysiology,
88:1128-1135, 2002.

[7]  E. Kobatake and K. Tanaka. Neuronal selectivities
to complex object features in the ventral visual
pathway of the macaque cerebral cortex. Journal of
Neurophysiology, 71:856-867, 1994.

[8]  I. Lampl et al. The MAX operation in cells in the
cat visual cortex [abstract]. Society for Neuroscience
Abstracts, 2001.

[9]  N. Logothetis et al. Shape representation in the
inferior temporal cortex of monkeys. Current Biology,
5:552-563, 1995.

[10]  L. Mahon and R. De Valois. Cartesian and
non-Cartesian responses in LGN, V1, and V2 cells.
Visual Neuroscience, 18:973-981, 2001.

[11]  A. Pasupathy and C. Connor. Responses to contour
features in macaque area V4. Journal of
Neurophysiology, 82:2490-2502, 1999.

[12]  A. Pasupathy and C. Connor. Shape representation
in area V4: Position-specific tuning for boundary
conformation. Journal of Neurophysiology,
86:2505-2519, 2001.

[13]  M. Riesenhuber and T. Poggio. Hierarchical models
of object recognition in cortex. Nature
Neuroscience, 2:1019-1025, 1999.

[14]  M. Riesenhuber and T. Poggio. Neural mechanisms
of object recognition. Current Opinion in
Neurobiology, 12:162-168, 2002.

[15]  D. L. Ringach. Spatial structure and symmetry of
simple-cell receptive fields in macaque V1. Journal
of Neurophysiology, 88:455-463, 2002.

[16]  T. Serre and M. Riesenhuber. Realistic modeling of
cortical cells for simulations with a model of object
recognition in cortex [in prep]. MIT AI Memo, 2003.

[17]  R. De Valois et al. The orientation and direction
selectivity of cells in macaque visual cortex. Vision
Research, 22:531-544, 1982.




